Traditionally, parallel discrete-event simulators based on the Time Warp synchronization 
protocol have been implemented using either the shared memory programming model or 
the distributed memory, message passing programming model. This was because the 
preferred hardware platform was either a shared memory multiprocessor workstation or a 
network of uniprocessor workstations. However, with the advent of "clumps" (cluster of 
shared memory multiprocessors), a change in this dichotomous view becomes ... 

The last several years of work in the area of knowledge-based systems has resulted in a 
deeper understanding of the potentials of the current generation of ideas, but more 
importantly, also about their limitations and the need for research both in a broader 
framework as well as in new directions. The following ideas seem to us to be worthy of 
note in this connection. 

A method for mapping ... 
Kelly L. Spicer, David A. Umphress 

October 1991 ACM SIGAda Ada Letters, Volume XI Issue 9 

Publisher: ACM Press 

Design reuse has more potential for increasing the productivity of software development 
and maintenance than do traditional approaches to software reuse. Current software 
development methods do not promote design reuse. A design mapping method from an 
object-oriented requirements analysis to a design adhering to these principles is 
presented. The method involves two transformation steps and introduces four 
representation tools for conducting the transformations. The second step produces Ad ... 

Architectural simulation has achieved a prominent role in the system design cycle by 
providing designers the ability to quickly examine a wide variety of design choices. 
However, the recent trend in system design toward architectures that react to circuit-level 
phenomena has outstripped the capabilities of traditional cycle-based architectural 
simulators. In this paper, we present an architectural simulator design that incorporates a 
circuit modeling capability, permitting architectural-level si ... 

Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic 
simulations. It is defined as the earliest time tag within the set of unprocessed pending 
events in distributed simulation. A number of techniques for determining GVT have been 
proposed in recent years, each having their own intrinsic properties. However, most of 
these techniques either focus on specific types of simulation problems or assume specific 
hardware support. This paper specifically addresses the GV ... 

Keywords: GVT computation, SPEEDES GVT, SPEEDES framework, Synchronous Parallel 
Environment for Emulation and Discrete-Event Simulation framework, digital simulation, 
distributed simulation, distributed synchronization, efficiency, event processing, flow 
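
The GVT definition quoted in the abstract above (the earliest time tag among unprocessed pending events) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a consistent global snapshot of every processor's pending events is already available, which is exactly what a practical distributed GVT computation has to work around. The function name and the dictionary layout are hypothetical.

```python
import math


def global_virtual_time(pending_by_processor):
    """Return the earliest timestamp among all unprocessed pending events.

    `pending_by_processor` maps a processor name to the timestamps of its
    pending events (hypothetical layout; events still in transit are assumed
    to be already listed under their destination processor).
    """
    timestamps = [t for events in pending_by_processor.values() for t in events]
    # With no pending events anywhere, nothing bounds virtual time from below.
    return min(timestamps) if timestamps else math.inf


snapshot = {"node_a": [12.0, 15.5], "node_b": [9.25], "node_c": []}
gvt = global_virtual_time(snapshot)
```

No event with a timestamp below `gvt` can be rolled back any more, which is why optimistic simulators use the value for committing output and reclaiming memory.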

POWER5 offers significantly increased performance over previous POWER designs by 
incorporating simultaneous multithreading, an enhanced memory subsystem, and 
extensive RAS and power management support. The 276M transistor processor is 
implemented in 130nm silicon-on-insulator technology with 8 levels of Cu metallization and 
operates at >1.5 GHz. 

Keywords: POWER5, clock gating, microprocessor design, power reduction, simultaneous 

Prefetching is often used to overlap memory latency with computation for array-based 
applications. However, prefetching for pointer-intensive applications remains a challenge 
because of the irregular memory access pattern and pointer-chasing problem. In this 
paper, we proposed a cooperative hardware/software prefetching framework, the push 
architecture, which is designed specifically for linked data structures. The push 
architecture exploits program structure for future address generation instea ... 

Keywords: Prefetch, linked data structures, memory hierarchy, pointer-chasing 



The M-Machine multicomputer 
Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny 
Gurevich, Whay S. Lee 

December 1995 Proceedings of the 28th annual international symposium on 
Microarchitecture 
Publisher: IEEE Computer Society Press 


6 High Efficiency Counter Mode Security Architecture via Prediction and 
Precomputation 

Weidong Shi, Hsien-Hsin S. Lee, Mrinmoy Ghosh, Chenghuai Lu, Alexandra Boldyreva 
June 2005 Proceedings of the 32nd Annual International Symposium on Computer 
Architecture ISCA '05 
Publisher: IEEE Computer Society 

Encrypting data in unprotected memory has gained much interest lately for digital rights 
protection and security reasons. Counter Mode is a well-known encryption scheme. It is a 
symmetric-key encryption scheme based on any block cipher, e.g. AES. The scheme's 
encryption algorithm uses a block cipher, a secret key and a counter (or a sequence 
number) to generate an encryption pad which is XORed with the data stored in memory. 
Like other memory encryption schemes, this method suffers from the inhe ... 
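
The pad-and-XOR step described in the abstract above can be sketched as follows. This is an illustration only, not the architecture the paper proposes: to stay dependency-free it substitutes SHA-256 for the block cipher (an assumption; the abstract names AES as an example), and the key, counter values, and function names are hypothetical.

```python
import hashlib


def make_pad(key: bytes, counter: int) -> bytes:
    """Derive an encryption pad from the secret key and a counter.

    SHA-256 stands in for the block cipher here (assumption made for a
    self-contained sketch; a real design would use a cipher such as AES).
    """
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"hypothetical-secret-key"
plaintext = b"data stored in memory"

ciphertext = xor_with_pad(plaintext, make_pad(key, counter=7))
recovered = xor_with_pad(ciphertext, make_pad(key, counter=7))
```

Because the pad depends only on the key and the counter, it can be generated before the data arrives from memory, leaving a single XOR on the critical path.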

This paper describes a method for improving the performance of a large direct-mapped 
cache by reducing the number of conflict misses. Our solution consists of two 
components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer 
that detects conflicts by recording and summarizing a history of cache misses, and a 
software policy within the operating system's virtual memory system that removes 
conflicts by dynamically remapping pages whenever large numbers of conflict miss ... 

A prerequisite of energy-aware scheduling is precise knowledge of any activity inside the 
computer system. Embedded hardware monitors (e.g., processor performance counters) 
have proved to offer valuable information in the field of performance analysis. The same 
approach can be applied to investigate the energy usage patterns of individual threads. 
We use information about active hardware units (e.g., integer/floating-point unit, 
cache/memory interface) gathered by event counters to establish at... 



Mitigating Amdahl's Law ... 

Murali Annavaram, Ed Grochowski, John Shen 
May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd 
Annual International Symposium on Computer Architecture ISCA '05, Volume 33 Issue 2 

Publisher: IEEE Computer Society, ACM Press 

This paper is motivated by three recent trends in computer design. First, chip multi- 
processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. 
Second, multi-threaded software that can take advantage of CMPs will soon become 
prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently 
will have phases of sequential execution; Amdahl's law dictates that the speedup of such 
parallel programs will be limited by the sequential portion of the comp ... 
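
For reference, the limit invoked in the abstract above is usually written as follows. The symbols are assumptions introduced here, not taken from the paper: $p$ is the fraction of execution time that can be parallelized and $N$ is the number of cores.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

With $p = 0.9$, for example, the speedup can never exceed $1 / (1 - 0.9) = 10$, no matter how many cores are added.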

High power consumption and low energy efficiency have become significant impediments 
to future performance improvements in modern microprocessors. This paper contributes 
to the solution of these problems by presenting: linear regression models for power 
consumption and a detailed study of energy efficiency in a modern out-of-order 
superscalar microprocessor. These simple (2-input) yet accurate (2.6% error) models 

Energy efficiency is becoming an increasingly important feature for both mobile and high- 
performance server systems. Most processors designed today include power management 
features that provide processor operating points which can be used in power management 
algorithms. However, existing power management algorithms implicitly assume that lower 
performance points are more energy efficient than higher performance points. Our 
empirical observations indicate that for many systems, this assumption 1 ... 

This paper proposes a novel technique for power-performance trade-off based on profile- 
driven code execution. Specifically, we show that there is an optimal level of parallelism 
for energy consumption and propose a compiler-assisted technique for code annotation 
that can be used at run-time to adaptively trade-off power and performance. As shown by 
experimental results, our approach is up to 23% better than clock throttling and is as 
efficient as voltage scaling (up to 10% better in some ca ... 